Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
null (Ed.)In this work, we first propose a parallel batch switching algorithm called Small-Batch Queue-Proportional Sampling (SB-QPS). Compared to other batch switching algorithms, SB-QPS significantly reduces the batch size without sacrificing the throughput performance and hence has much lower delay when traffic load is light to moderate. It also achieves the lowest possible time complexity of O(1) per matching computation per port, via parallelization. We then propose another algorithm called Sliding-Window QPS (SW-QPS). SW-QPS retains and enhances all benefits of SB-QPS, and reduces the batching delay to zero via a novel switching framework called sliding-window switching. In addition, SW-QPS computes matchings of much higher qualities, as measured by the resulting throughput and delay performances, than QPS-1, the state-of-the-art regular switching algorithm that builds upon the same underlying bipartite matching algorithm.more » « less
-
In an input-queued switch, a crossbar schedule, or a matching between the input ports and the output ports needs to be computed for each switching cycle, or time slot. It is a challenging research problem to design switching algorithms that produce high-quality matchings yet have a very low computational complexity when the switch has a large number of ports. Indeed, there appears to be a fundamental tradeoff between the computational complexity of the switching algorithm and the quality of the computed matchings. Parallel maximal matching algorithms (adapted for switching) appear to be a sweet tradeoff point in this regard. On one hand, they provide the following performance guarantees: Using maxi- mal matchings as crossbar schedules results in at least 50% switch throughput and order-optimal (i.e., independent of the switch size 𝑁 ) average delay bounds for various traffic arrival processes. On the other hand, their computational complexities can be as low as 𝑂 (log_2 𝑁) per port/processor, which is much lower than those of the algorithms for finding matchings of higher qualities such as maximum weighted matching. In this work, we propose QPS-r, a parallel iterative switching algorithm that has the lowest possible computational complexity: 𝑂(1) per port. Yet, the matchings that QPS-r computes have the same quality as maximal matchings in the following sense: Using such matchings as crossbar schedules results in exactly the same aforementioned provable throughput and delay guarantees as using maximal matchings, as we show using Lyapunov stability analysis. Although QPS-r builds upon an existing add-on technique called Queue-Proportional Sampling (QPS), we are the first to discover and prove this nice property of such matchings. We also demon- strate that QPS-3 (running 3 iterations) has comparable empirical throughput and delay performances as iSLIP (running log 𝑁 itera- 2 tions), a refined and optimized representative maximal matching algorithm adapted for switching.more » « less
-
The research problem of how to use a high-speed circuit switch, typically an optical switch, to most effectively boost the switching capacity of a datacenter network, has been extensively studied. In this work, we focus on a different but related research problem that arises when multiple (say $$s$$) parallel circuit switches are used: How to best split a switching workload $$D$$ into sub-workloads $$D_1, D_2, ..., D_s$$, and give them to the $$s$$ switches as their respective workloads, so that the overall makespan of the parallel switching system is minimized? Computing such an optimal split is unfortunately NP-hard, since the circuit/optical switch incurs a nontrivial reconfiguration delay when the switch configuration has to change. In this work, we formulate a weaker form of this problem: How to minimize the total number of nonzero entries in $$D_1, D_2, ..., D_s$$ (so that the overall reconfiguration cost can be kept low), under the constraint that every row or column sum of $$D$$ (which corresponds to the workload imposed on a sending or receiving rack respectively) is evenly split? Although this weaker problem is still NP-hard, we are able to design LESS, an approximation algorithm that has a low approximation ratio of only $$1+\epsilon$$ in practice and a low computational complexity of only $O(m^2)$, where $$m = \|D\|_0$$ is the number of nonzero entries in $$D$$. Our simulation studies show that LESS results in excellent overall makespan performances under realistic datacenter traffic workloads and parameter settings.more » « less
-
In an input-queued switch, a crossbar schedule, or a matching between the input ports and the output ports needs to be computed for each switching cycle, or time slot. It is a challenging research problem to design switching algorithms that produce high-quality matchings yet have a very low computational complexity when the switch has a large number of ports. Indeed, there appears to be a fundamental tradeoff between the computational complexity of the switching algorithm and the quality of the computed matchings. Parallel maximal matching algorithms (adapted for switching) appear to be a sweet tradeoff point in this regard. On one hand, they provide the following performance guarantees: Using maxi- mal matchings as crossbar schedules results in at least 50% switch throughput and order-optimal (i.e., independent of the switch size 𝑁 ) average delay bounds for various traffic arrival processes. On the other hand, their computational complexities can be as low as 𝑂 (log2 𝑁 ) per port/processor, which is much lower than those of the algorithms for finding matchings of higher qualities such as maximum weighted matching. In this work, we propose QPS-r, a parallel iterative switching algorithm that has the lowest possible computational complexity: 𝑂(1) per port. Yet, the matchings that QPS-r computes have the same quality as maximal matchings in the following sense: Using such matchings as crossbar schedules results in exactly the same aforementioned provable throughput and delay guarantees as using maximal matchings, as we show using Lyapunov stability analysis. Although QPS-r builds upon an existing add-on technique called Queue-Proportional Sampling (QPS), we are the first to discover and prove this nice property of such matchings. We also demon- strate that QPS-3 (running 3 iterations) has comparable empirical throughput and delay performances as iSLIP (running log 𝑁 itera- 2 tions), a refined and optimized representative maximal matching algorithm adapted for switching.more » « less
An official website of the United States government
